Workshop #2
Recent Developments in Heterogeneous System Integration and Architecture
"Challenges and Opportunities in Emerging Memory for AI accelerators"
Prof. Boris MURMANN (Stanford)
Abstract
Emerging memory technologies such as RRAM and MRAM have been actively researched over the past decade and hold great promise for dense, non-volatile weight storage in machine learning accelerators. However, most of the prototypes developed so far do not deliver leading-edge performance and energy efficiency. This presentation investigates the underlying reasons and proposes a path forward, specifically focusing on RRAM and its interface circuit limitations.
References
- B. Murmann, “Mixed-Signal Computing for Deep Neural Network Inference,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 29, no. 1, pp. 3-13, Jan. 2021.
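
To make the interface-circuit bottleneck concrete, the back-of-envelope model below estimates energy per multiply-accumulate (MAC) in an RRAM crossbar when a column ADC is shared across the rows. All constants (array MAC energy, ADC figure of merit) are illustrative assumptions rather than figures from the talk; the point is only that conversion energy grows roughly exponentially with ADC resolution, so the read-out interface, rather than the memory cells, can cap overall efficiency.

# Back-of-envelope model of why the read-out interface can dominate
# energy in an RRAM crossbar matrix-vector multiply. All numbers are
# illustrative assumptions, not measurements.

N_ROWS = 256              # crossbar rows (MACs accumulated per column read)
E_MAC_ARRAY_FJ = 1.0      # assumed analog MAC energy inside the array, fJ
E_CONV_FJ_PER_STEP = 1.0  # assumed ADC figure of merit, fJ/conversion-step

def energy_per_mac_fj(adc_bits, n_rows=N_ROWS):
    """Total energy per MAC = array MAC energy + amortized ADC energy."""
    e_adc = E_CONV_FJ_PER_STEP * (2 ** adc_bits)  # energy per conversion
    return E_MAC_ARRAY_FJ + e_adc / n_rows        # one ADC serves a column

for bits in (4, 6, 8, 10):
    e = energy_per_mac_fj(bits)
    print(f"{bits}-bit ADC: {e:.2f} fJ/MAC "
          f"(interface share {100 * (1 - E_MAC_ARRAY_FJ / e):.0f}%)")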

"CNN Pruning and Performance Optimization on GPU based on History Data"
Prof. Wei ZHANG (HKUST)
Abstract
As Deep Neural Networks (DNNs) become increasingly popular, there is a growing trend to accelerate DNN applications on hardware platforms such as GPUs for higher performance and efficiency. However, designing hardware-efficient architectures and tuning their performance is time-consuming, owing to the deep hardware expertise required, the large design space, and the high cost of evaluating each design point. In this work, we propose a comprehensive framework for DNN design optimization on GPU. Designs are generated automatically from templates and the corresponding parameter combinations, and a novel framework based on transfer learning and a Guided Genetic Algorithm (GGA) is proposed to speed up the hardware tuning process. Our experiments show that we achieve superior performance to state-of-the-art approaches such as the auto-tuning framework TVM and the hand-optimized library cuDNN, while reducing search time by 8.96x and 4.58x compared with the XGBoost tuner and the GA tuner in TVM, respectively.
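
As a flavor of how template-based tuning searches a parameter space, the sketch below runs a plain genetic algorithm over a hypothetical kernel template. It is a generic GA skeleton, not the proposed GGA, and the parameter names and cost function are stand-ins for compiling and timing real GPU kernels.

# Minimal genetic-algorithm tuner over a templated design space.
# SPACE and cost() are hypothetical stand-ins for a real GPU kernel
# template and its measured runtime.
import random

SPACE = {
    "tile_x": [8, 16, 32, 64],
    "tile_y": [8, 16, 32, 64],
    "unroll": [1, 2, 4, 8],
}

def random_point():
    return {k: random.choice(v) for k, v in SPACE.items()}

def cost(p):
    # Stand-in for generating, compiling, and timing the kernel on a GPU.
    return abs(p["tile_x"] * p["tile_y"] - 1024) + 10 / p["unroll"]

def mutate(p):
    q = dict(p)
    k = random.choice(list(SPACE))
    q[k] = random.choice(SPACE[k])
    return q

def crossover(a, b):
    return {k: random.choice((a[k], b[k])) for k in SPACE}

def ga_tune(pop_size=20, generations=30):
    pop = [random_point() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        elite = pop[: pop_size // 4]          # keep the best quarter
        children = [mutate(crossover(random.choice(elite),
                                     random.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return min(pop, key=cost)

print(ga_tune())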
We also investigate auto-pruning to obtain more compact neural networks that ease hardware constraints. We observe that the widely used reinforcement learning (RL) algorithm has become the runtime bottleneck of the auto-pruning process. We therefore propose a framework that significantly accelerates the RL algorithm by exploiting the pruning history from other pruning scenarios. Experiments show that our framework accelerates the auto-pruning process by 1.5x to 2.5x for ResNet-20, and by 1.81x to 2.375x for other neural networks such as ResNet-56, ResNet-18, and MobileNet v1.
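
The history-reuse idea can be illustrated in miniature: per-layer pruning ratios are warm-started from a table recorded during earlier pruning runs and then refined locally. The history table, reward function, and local search below are all hypothetical stand-ins for the RL agent and a measured accuracy/size trade-off.

# Sketch of reusing pruning history to warm-start a new pruning search
# instead of exploring per-layer ratios from scratch.

HISTORY = {  # layer type -> pruning ratio that worked in earlier runs
    "conv3x3": 0.5,
    "conv1x1": 0.3,
    "fc": 0.7,
}

def warm_start(layer_types, default=0.2):
    """Seed per-layer ratios from history where a similar layer exists."""
    return [HISTORY.get(t, default) for t in layer_types]

def refine(ratios, reward, step=0.05, rounds=3):
    """Tiny coordinate ascent around the warm start (proxy for the RL agent)."""
    best, best_r = list(ratios), reward(ratios)
    for _ in range(rounds):
        for i in range(len(best)):
            for delta in (-step, step):
                cand = list(best)
                cand[i] = min(0.9, max(0.0, cand[i] + delta))
                if reward(cand) > best_r:
                    best, best_r = cand, reward(cand)
    return best

# Hypothetical reward: favor compression until it starts costing accuracy.
reward = lambda r: (sum(r) / len(r)) - (sum(r) / len(r)) ** 4
layers = ["conv3x3", "conv3x3", "conv1x1", "fc"]
print(refine(warm_start(layers), reward))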

"Trend of SRAM-based CIM Design"
Prof. Chi Ying TSUI (HKUST)
Abstract
Executing AI computation on conventional von Neumann architectures induces a huge amount of data movement between memory and processing units, which increases latency and consumes considerable energy. To tackle this issue, computing-in-memory (CIM) architectures have been proposed, in which computations are executed within the storage cells, eliminating the expensive data movement between memory and processing units. Emerging memories such as ReRAM have been proposed for CIM applications, but many technical hurdles remain before they can be deployed in real applications. Using a mature memory technology such as SRAM thus becomes more appealing. In this talk, the current trend of designing SRAM-based CIM architectures for AI applications will be presented, and different schemes such as current-based, charge-based, and time-based techniques will be discussed.
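
To make the in-memory computation concrete, the following functional model (a sketch, not any particular published design) computes one bit-serial SRAM-CIM dot product: weight bit planes are stored across the rows of a column, binary inputs drive the wordlines, the bitline sums the selected cells in the analog domain, and a column ADC quantizes the result. The current-, charge-, and time-based schemes differ in how that analog summation is physically realized; this arithmetic-level model abstracts that away.

# Functional model of one bit-serial SRAM-CIM dot product.
import numpy as np

rng = np.random.default_rng(0)
ROWS, W_BITS, ADC_BITS = 64, 4, 5

weights = rng.integers(0, 2 ** W_BITS, ROWS)  # unsigned 4-bit weights
inputs = rng.integers(0, 2, ROWS)             # binary activations

def adc(analog_sum, full_scale, bits):
    """Quantize a bitline sum, as a finite-resolution column ADC would."""
    levels = 2 ** bits - 1
    code = round(analog_sum / full_scale * levels)
    return code * full_scale / levels

def cim_dot(weights, inputs):
    acc = 0.0
    for b in range(W_BITS):                        # bit-serial over weight bits
        bit_plane = (weights >> b) & 1             # one bit cell per row
        bitline = float(np.sum(bit_plane * inputs))  # analog bitline summation
        acc += adc(bitline, ROWS, ADC_BITS) * (1 << b)  # shift-and-add
    return acc

print("CIM result:", round(cim_dot(weights, inputs)))
print("Exact dot :", int(weights @ inputs))

Comparing the CIM result against the exact dot product exposes the quantization error introduced by a finite-resolution ADC, one of the key trade-offs across the read-out schemes discussed in the talk.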
